Moses: Open Source Toolkit for Statistical Machine Translation

نویسندگان

  • Philipp Koehn
  • Hieu Hoang
  • Alexandra Birch
  • Chris Callison-Burch
  • Marcello Federico
  • Nicola Bertoldi
  • Brooke Cowan
  • Wade Shen
  • Christine Moran
  • Richard Zens
  • Chris Dyer
  • Ondrej Bojar
  • Alexandra Constantin
  • Evan Herbst
چکیده

We describe an open-source toolkit for statistical machine translation whose novel contributions are (a) support for linguistically motivated factors, (b) confusion network decoding, and (c) efficient data formats for translation models and language models. In addition to the SMT decoder, the toolkit also includes a wide variety of tools for training, tuning and applying the system to many translation tasks.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Phrasal: A Toolkit for New Directions in Statistical Machine Translation

We present a new version of Phrasal, an open-source toolkit for statistical phrasebased machine translation. This revision includes features that support emerging research trends such as (a) tuning with large feature sets, (b) tuning on large datasets like the bitext, and (c) web-based interactive machine translation. A direct comparison with Moses shows favorable results in terms of decoding s...

متن کامل

HKUST statistical machine translation experiments for IWSLT 2007

This paper describes the HKUST experiments in the IWSLT 2007 evaluation campaign on spoken language translation. Our primary objective was to compare the open-source phrase-based statistical machine translation toolkit Moses against Pharaoh. We focused on Chinese to English translation, but we also report results on the Arabic to English, Italian to English, and Japanese to English tasks.

متن کامل

A unified framework for phrase-based, hierarchical, and syntax-based statistical machine translation

Despite many differences between phrase-based, hierarchical, and syntax-based translation models, their training and testing pipelines are strikingly similar. Drawing on this fact, we extend the Moses toolkit to implement hierarchical and syntactic models, making it the first open source toolkit with end-to-end support for all three of these popular models in a single package. This extension su...

متن کامل

Data selection and smoothing in an open-source system for the 2008 NIST machine translation evaluation

This paper gives a detailed description of a statistical machine translation system developed for the 2008 NIST open MT evaluation. The system is based on the open source toolkit Moses with extensions for language model rescoring in a second pass. Significant improvements were obtained with data selection methods for the language and translation model. An improvement of more than 1 point BLEU o...

متن کامل

IRSTLM: an open source toolkit for handling large scale language models

Research in speech recognition and machine translation is boosting the use of large scale n-gram language models. We present an open source toolkit that permits to efficiently handle language models with billions of n-grams on conventional machines. The IRSTLM toolkit supports distribution of ngram collection and smoothing over a computer cluster, language model compression through probability ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007